Abstract

Permutation codes are a class of structured vector quantizers with a computationally-simple encoding procedure 
based on sorting the scalar components. Using a codebook comprising several permutation codes as subcodes 
preserves the simplicity of encoding while increasing the number of rate-distortion operating points, improving the 
convex hull of operating points, and increasing design complexity. We show that when the subcodes are designed 
with the same composition, optimization of the codebook reduces to a lower-dimensional vector quantizer design 
within a single cone. Heuristics for reducing design complexity are presented, including an optimization of the rate 
allocation in a shape-gain vector quantizer with gain-dependent wrapped spherical shape codebook. 

Index Terms 

Gaussian source, group codes, integer partitions, order statistics, permutation codes, rate allocation, source 
coding, spherical codes, vector quantization 

I. Introduction 

A permutation source code [1], [2] places all codewords on a single sphere by using the permutations of an initial codeword. The size of the codebook is determined by multiplicities of repeated entries in the initial codeword, and the complexity of optimal encoding is low. In the limit of large vector dimension, an optimal permutation code for a memoryless source performs as well as entropy-constrained scalar quantization [3]. This could be deemed a disappointment because the constraint of placing all codewords on a single sphere does not preclude performance approaching the rate-distortion bound when coding a memoryless Gaussian source [4]. An advantage that remains is that the fixed-rate output of the permutation source code avoids the possibility of buffer overflow associated with entropy coding highly nonequiprobable outputs of a quantizer [5].

The performance gap between permutation codes and optimal spherical codes, along with the knowledge that the performance of permutation codes does not improve monotonically with increasing vector dimension [6], motivates the present paper. We consider generalizing permutation source codes to have more than one initial codeword. While adding very little to the encoding complexity, this makes the codebook of the vector quantizer (VQ) lie in the union of concentric spheres rather than in a single sphere. Our use of multiple spheres is similar to the wrapped spherical shape-gain vector quantization of Hamkins and Zeger [7]; one of our results, which may be of independent interest, is an optimal rate allocation for that technique. Our use of permutations could be replaced by the action of other groups to obtain further generalizations [8].

Design of a permutation source code includes selection of the multiplicities in the initial codeword; these 
multiplicities form a composition of the vector dimension [9, Ch. 5]. The generalization makes the design problem
more difficult because there is a composition associated with each initial codeword. Our primary focus is on 
methods for reducing the design complexity. We demonstrate the effectiveness of these methods and improvements 
over ordinary permutation source codes through simulations. 

The use of multiple initial codewords was introduced as "composite permutation coding" by Lu et al. [10], [11] and applied to speech/audio coding by Abe et al. [12]. These previous works restrict the constituent permutation source codes to have the same number of codewords, neglect the design of compositions, and use an iterative VQ design algorithm at the full vector dimension. In contrast, we allow the compositions to be identical or different, thus allowing the sizes of subcodes to differ. In the case of a single, common composition, we show that a reduced-dimension VQ design problem arises. For the general case, we provide a rate allocation across subcodes.

This material is based upon work supported by the National Science Foundation under Grant No. 0729069. This work was also supported 
in part by a Vietnam Education Foundation Fellowship. 

This work was presented in part at the IEEE International Symposium on Information Theory, Seoul, South Korea, June-July 2009.
Massachusetts Institute of Technology, Cambridge, MA 02139 USA.

The generalization that we study maintains the low $O(n \log n)$ encoding complexity for vectors of dimension $n$ that permutation source codes achieve. Vector permutation codes are a different generalization with improved performance [13]. Their encoding procedure, however, requires solving the assignment problem in combinatorial optimization [14] and has complexity $O(n^2 \sqrt{n} \log n)$.

The paper is organized as follows: We review the attainment of the rate-distortion bound by spherical source codes and the basic formulation of permutation coding in Section II. Section III introduces concentric permutation codes and discusses the difficulty of their optimization. One simplification that reduces the design complexity, the use of a single common composition for all initial codewords, is discussed in Section IV. The use of a common composition obviates the issue of allocating rate amongst concentric spheres of codewords. Section V returns to the general case, with compositions that are not necessarily identical. We develop fixed- and variable-rate generalizations of wrapped spherical shape-gain vector quantization for the purpose of guiding the rate allocation problem. Concluding comments appear in Section VI.

II. Background 

Let $X \in \mathbb{R}^n$ be a random vector with independent $\mathcal{N}(0, \sigma^2)$ components. We wish to approximate $X$ with a codeword $\hat{X}$ drawn from a finite codebook $\mathcal{C}$. We want small per-component mean-squared error (MSE) distortion $D = n^{-1} E[\|X - \hat{X}\|^2]$ when the approximation $\hat{X}$ is represented with $nR$ bits. In the absence of entropy coding, this means the codebook has size $2^{nR}$. For a given codebook, the distortion is minimized when $\hat{X}$ is the codeword closest to $X$.



A. Spherical Codes 

In a spherical (source) code, all codewords lie on a single sphere in $\mathbb{R}^n$. Nearest-neighbor encoding with such a codebook partitions $\mathbb{R}^n$ into $2^{nR}$ cells that are (unbounded) convex cones with apexes at the origin. In other words, the representations of $X$ and $\alpha X$ are the same for any scalar $\alpha > 0$. Thus a spherical code essentially ignores $\|X\|$, placing all codewords at radius

\[ E[\|X\|] = \frac{\sqrt{2\pi\sigma^2}}{\beta(n/2,\, 1/2)} \approx \sigma \sqrt{n - 1/2}, \]

where $\beta(\cdot, \cdot)$ is the beta function, while representing $X/\|X\|$ with $nR$ bits.

Sakrison [4] first analyzed the performance of spherical codes for memoryless Gaussian sources. Following [4], [7], the distortion can be decomposed as

\[ D = \frac{1}{n} E\left[ \left\| \frac{E[\|X\|]}{\|X\|}\, X - \hat{X} \right\|^2 \right] + \frac{1}{n} \operatorname{var}(\|X\|). \tag{1} \]

The first term is the distortion between the projection of $X$ to the code sphere and its representation on the sphere, and the second term is the distortion incurred from the projection. The second term vanishes as $n$ increases even though no bits are spent to convey the norm of $X$. Placing codewords uniformly at random on the sphere controls the first term sufficiently for achieving the rate-distortion bound as $n \to \infty$.



B. Permutation Codes 

1) Definition and Encoding: A permutation code (PC) is a special spherical code in which all the codewords are related by permutation. Permutation channel codes were introduced by Slepian [15] and modified through the duality between source encoding and channel decoding by Dunn [1]. They were then developed by Berger et al. [2], [3], [16].

There are two variants of permutation codes: 

Variant I: Let $\mu_1 > \mu_2 > \cdots > \mu_K$ be real numbers, and let $n_1, n_2, \ldots, n_K$ be positive integers with sum equal to $n$ (an (ordered) composition of $n$). The initial codeword of the codebook $\mathcal{C}$ has the form

\[ \hat{x}_{\mathrm{init}} = (\mu_1, \ldots, \mu_1, \mu_2, \ldots, \mu_2, \ldots, \mu_K, \ldots, \mu_K), \tag{2} \]

where each $\mu_i$ appears $n_i$ times. The codebook is the set of all distinct permutations of $\hat{x}_{\mathrm{init}}$. The number of codewords in $\mathcal{C}$ is thus given by the multinomial coefficient

\[ M = \frac{n!}{n_1!\, n_2! \cdots n_K!}. \tag{3} \]

The permutation structure of the codebook enables low-complexity nearest-neighbor encoding [2]: map $X$ to the codeword $\hat{X}$ whose components have the same order as $X$; in other words, replace the $n_1$ largest components of $X$ with $\mu_1$, the $n_2$ next-largest components of $X$ with $\mu_2$, and so on.

Variant II: The initial codeword $\hat{x}_{\mathrm{init}}$ still has the form (2), but now all its entries are nonnegative; i.e., $\mu_1 > \mu_2 > \cdots > \mu_K \ge 0$. The codebook now consists of all possible permutations of $\hat{x}_{\mathrm{init}}$ in which each nonzero component is possibly negated. The number of codewords is thus given by

\[ M = 2^h \cdot \frac{n!}{n_1!\, n_2! \cdots n_K!}, \tag{4} \]

where $h$ is the number of positive components in $\hat{x}_{\mathrm{init}}$. Optimal encoding is again simple [2]: map $X$ to the codeword $\hat{X}$ whose components have the same order in absolute value and match the signs of corresponding components of $X$.

Since the complexity of sorting is $O(n \log n)$ operations, the encoding complexity is much lower than with an unstructured VQ and only $O(\log n)$ times higher than scalar quantization.
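The sorting-based encoders described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function names (`encode_variant1`, `encode_variant2`, `codebook_size`) and the list-based representation of the initial codeword are assumptions made for this example.

```python
from math import factorial

def encode_variant1(x, mus, comp):
    """Nearest codeword of a Variant I code: the n_1 largest components of x
    become mus[0], the next n_2 become mus[1], and so on."""
    # Indices of x ordered from the largest to the smallest component.
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=True)
    # Initial codeword: mus[i] repeated comp[i] times, in descending order.
    init = [mu for mu, count in zip(mus, comp) for _ in range(count)]
    xhat = [0.0] * len(x)
    for rank, idx in enumerate(order):
        xhat[idx] = init[rank]
    return xhat

def encode_variant2(x, mus, comp):
    """Variant II: order by absolute value, then copy the signs of x."""
    mags = encode_variant1([abs(v) for v in x], mus, comp)
    return [m if v >= 0 else -m for m, v in zip(mags, x)]

def codebook_size(comp, mus=None, variant=1):
    """Number of codewords: the multinomial coefficient, times 2^h for Variant II."""
    size = factorial(sum(comp))
    for count in comp:
        size //= factorial(count)
    if variant == 2:
        h = sum(count for mu, count in zip(mus, comp) if mu > 0)
        size *= 2 ** h
    return size
```

The only superlinear step is the sort, which is where the $O(n \log n)$ encoding cost comes from.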

2) Performance and Optimization: For i.i.d. sources, each codeword is chosen with equal probability. Consequently, there is no improvement from entropy coding and the per-letter rate is simply $R = n^{-1} \log M$.

Let $\xi_1 \ge \xi_2 \ge \cdots \ge \xi_n$ denote the order statistics of random vector $X = (X_1, \ldots, X_n)$, and $\eta_1 \ge \eta_2 \ge \cdots \ge \eta_n$ denote the order statistics of random vector $|X| = (|X_1|, \ldots, |X_n|)$.¹ With these notations and an initial codeword given by (2), the per-letter distortion of optimally-encoded Variant I and Variant II codes can be deduced simply by realizing which order statistics are mapped to each element of $\hat{x}_{\mathrm{init}}$:

\[ D_{\mathrm{I}} = n^{-1} E\left[ \sum_{i=1}^{K} \sum_{\ell \in I_i} (\xi_\ell - \mu_i)^2 \right] \quad \text{and} \tag{5} \]

\[ D_{\mathrm{II}} = n^{-1} E\left[ \sum_{i=1}^{K} \sum_{\ell \in I_i} (\eta_\ell - \mu_i)^2 \right], \tag{6} \]

where the $I_i$s are the groups of indices generated by the composition, i.e.,

\[ I_1 = \{1, 2, \ldots, n_1\}, \qquad I_i = \left\{ \left( \sum_{m=1}^{i-1} n_m \right) + 1, \ldots, \sum_{m=1}^{i} n_m \right\}, \quad i \ge 2. \]

Given a composition $(n_1, n_2, \ldots, n_K)$, minimization of $D_{\mathrm{I}}$ or $D_{\mathrm{II}}$ can be done separately for each $\mu_i$, yielding optimal values

\[ \mu_i = n_i^{-1} \sum_{\ell \in I_i} E[\xi_\ell], \quad \text{for Variant I, and} \tag{7} \]

\[ \mu_i = n_i^{-1} \sum_{\ell \in I_i} E[\eta_\ell], \quad \text{for Variant II.} \tag{8} \]

Overall minimization of $D_{\mathrm{I}}$ or $D_{\mathrm{II}}$ over the choice of $K$, $\{n_i\}_{i=1}^{K}$, and $\{\mu_i\}_{i=1}^{K}$ subject to a rate constraint is difficult because of the integer constraint of the composition.
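For a fixed composition, the design step is easy to illustrate. The sketch below estimates the order-statistic means of a standard Gaussian vector by Monte Carlo (an assumption of this example; tabulated or numerically integrated means would serve equally well), averages them over each index group to obtain the Variant I codeword values, and evaluates the rate. All function names are hypothetical. The distortion helper uses the identity $D_{\mathrm{I}} = \sigma^2 - n^{-1} \sum_i n_i \mu_i^2$, which follows from expanding the square in the Variant I distortion when each $\mu_i$ is the group average of the order-statistic means.

```python
import random
from math import factorial, log2

def order_stat_means(n, trials=20000, seed=0):
    """Monte Carlo estimate of E[xi_1] >= ... >= E[xi_n] for i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    sums = [0.0] * n
    for _ in range(trials):
        xs = sorted((rng.gauss(0.0, 1.0) for _ in range(n)), reverse=True)
        for i, v in enumerate(xs):
            sums[i] += v
    return [s / trials for s in sums]

def design_variant1(comp, means):
    """Codeword values for a given composition: group averages of the means."""
    mus, start = [], 0
    for count in comp:
        mus.append(sum(means[start:start + count]) / count)
        start += count
    return mus

def rate_bits(comp):
    """Per-letter rate (base-2 logarithm) of the permutation code."""
    n = sum(comp)
    size = factorial(n)
    for count in comp:
        size //= factorial(count)
    return log2(size) / n

def distortion_variant1(comp, mus, variance=1.0):
    """Per-letter distortion, valid only when mus are the group averages above."""
    n = sum(comp)
    return variance - sum(c * mu * mu for c, mu in zip(comp, mus)) / n
```

An exhaustive design would repeat this for every composition of $n$ and keep the lower convex hull of the resulting (rate, distortion) pairs, which is exactly the search that becomes expensive as $n$ grows.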

The analysis of [3] shows that as $n$ grows large, the composition can be designed to give performance equal to optimal entropy-constrained scalar quantization (ECSQ) of $X$. Heuristically, it seems that for large block lengths, PCs suffer because there are too many permutations ($n^{-1} \log_2 n!$ grows) and the vanishing fraction that are chosen to meet a rate constraint do not form a good code. The technique we study in this paper is for moderate values of $n$, for which the second term of (1) is not negligible; thus, it is not adequate to place all codewords on a single sphere.

III. Permutation Codes with Multiple Initial Codewords 

In this paper, we generalize ordinary PCs by allowing multiple initial codewords. The resulting codebook is 
contained in a set of concentric spheres. 



¹Because of the convention $\mu_i > \mu_{i+1}$ established by Berger et al. [2], it is natural to index the order statistics in descending order as shown, which is opposite to the ascending convention in the order statistics literature [17].

A. Basic Construction

Let $J$ be a positive integer. We will define a concentric permutation (source) code (CPC) with $J$ initial codewords. This is equivalent to having a codebook that is the union of $J$ PCs. Each notation from Section II-B is extended with a superscript or subscript $j \in \{1, 2, \ldots, J\}$ that indexes the constituent PC. Thus, $\mathcal{C}_j$ is the subcodebook of full codebook $\mathcal{C} = \bigcup_{j=1}^{J} \mathcal{C}_j$ consisting of all $M_j$ distinct permutations of initial vector

\[ \hat{x}^j_{\mathrm{init}} = (\mu_1^j, \ldots, \mu_1^j, \mu_2^j, \ldots, \mu_2^j, \ldots, \mu_{K_j}^j, \ldots, \mu_{K_j}^j), \tag{9} \]

where each $\mu_i^j$ appears $n_i^j$ times, $\mu_1^j > \mu_2^j > \cdots > \mu_{K_j}^j$ (all of which are nonnegative for Variant II), and $\sum_{i=1}^{K_j} n_i^j = n$. Also, $\{I_i^j\}_{i=1}^{K_j}$ are sets of indices generated by the $j$th composition.

Proposition 1: Nearest-neighbor encoding of $X$ with codebook $\mathcal{C}$ can be accomplished with the following procedure:

1) For each $j$, find $\hat{X}_j \in \mathcal{C}_j$ whose components have the same order as $X$.

2) Encode $X$ with $\hat{X}$, the nearest codeword amongst $\{\hat{X}_j\}_{j=1}^{J}$.

Proof: Suppose $X' \in \mathcal{C}$ is an arbitrary codeword. Since $\mathcal{C} = \bigcup_{j=1}^{J} \mathcal{C}_j$, there must exist $j_0 \in \{1, 2, \ldots, J\}$ such that $X' \in \mathcal{C}_{j_0}$. We have

\[ \|X - \hat{X}\| \overset{(a)}{\le} \|X - \hat{X}_{j_0}\| \overset{(b)}{\le} \|X - X'\|, \]

where (a) follows from the second step of the algorithm, and (b) follows from the first step and the optimality of the encoding for ordinary PCs. ■

The first step of the algorithm requires $O(n \log n) + O(Jn)$ operations (sorting components of $X$ and reordering each $\hat{x}^j_{\mathrm{init}}$ according to the index matrix obtained from the sorting); the second step requires $O(Jn)$ operations. The total complexity of encoding is therefore $O(n \log n)$, provided that we keep $J = O(\log n)$. In fact, in this rough accounting, the encoding with $J = O(\log n)$ is as cheap as the encoding for ordinary PCs.
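The two-step procedure of Proposition 1 can be sketched as follows for Variant I subcodes. The function name `encode_cpc` and the representation of each subcode as a `(mus, comp)` pair are assumptions of this example. The source vector is sorted once, and each of the $J$ candidates is then scored in $O(n)$ operations by comparing the sorted source against the corresponding initial codeword.

```python
def encode_cpc(x, subcodes):
    """Nearest-neighbor encoding with a concentric permutation code.

    subcodes is a list of (mus, comp) pairs, one per initial codeword.
    Returns the index of the chosen subcode and the reconstruction.
    """
    n = len(x)
    # Sort once; the same ordering serves every subcodebook.
    order = sorted(range(n), key=lambda i: x[i], reverse=True)
    best = None
    for j, (mus, comp) in enumerate(subcodes):
        init = [mu for mu, count in zip(mus, comp) for _ in range(count)]
        # Squared distance between x and the same-order codeword of subcode j.
        dist = sum((x[idx] - init[rank]) ** 2 for rank, idx in enumerate(order))
        if best is None or dist < best[0]:
            best = (dist, j, init)
    _, j, init = best
    xhat = [0.0] * n
    for rank, idx in enumerate(order):
        xhat[idx] = init[rank]
    return j, xhat
```

Scaling the input changes which sphere is selected but not the ordering, which is the sense in which the codebook lies on concentric spheres.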

For i.i.d. sources, codewords within a subcodebook are approximately equally likely to be chosen, but codewords in different subcodebooks may have very different probabilities. Using entropy coding yields

\[ R = n^{-1} \left[ H\left( \{p_j\}_{j=1}^{J} \right) + \sum_{j=1}^{J} p_j \log M_j \right], \tag{10} \]

where $H(\cdot)$ denotes the entropy of a distribution, $p_j$ is the probability of choosing subcodebook $\mathcal{C}_j$, and $M_j$ is the number of codewords in $\mathcal{C}_j$. Note that (10) is suggestive of a two-stage encoding scheme with a variable-rate code for the index of the chosen subcodebook and a fixed-rate code for the index of the chosen codeword within the subcodebook. Without entropy coding, the rate is

\[ R = n^{-1} \log \left( \sum_{j=1}^{J} M_j \right). \]
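The two rate expressions above are easy to compare numerically. The following sketch (the function name `cpc_rates` is an assumption, and logarithms are taken base 2 so that rates are in bits) computes both for given subcodebook sizes and selection probabilities.

```python
from math import log2

def cpc_rates(sizes, probs, n):
    """Per-letter rates of a CPC with and without entropy coding.

    sizes[j] is M_j and probs[j] is the probability of choosing subcode j.
    Returns (variable_rate, fixed_rate) in bits per component.
    """
    entropy = -sum(p * log2(p) for p in probs if p > 0)
    variable = (entropy + sum(p * log2(m) for p, m in zip(probs, sizes))) / n
    fixed = log2(sum(sizes)) / n
    return variable, fixed
```

By Jensen's inequality the variable rate never exceeds the fixed rate, with equality when each $p_j$ is proportional to $M_j$.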



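The per-letter distortion for Variant I codes is now given by

\[ D = n^{-1} E\left[ \min_{1 \le j \le J} \|X - \hat{X}_j\|^2 \right] \tag{11} \]

\[ \phantom{D} = n^{-1} E\left[ \min_{1 \le j \le J} \sum_{i=1}^{K_j} \sum_{\ell \in I_i^j} (\xi_\ell - \mu_i^j)^2 \right], \tag{12} \]

where (12) is obtained by rearranging the components of $X$ and $\hat{X}_j$ in descending order. The distortion for Variant II codes has the same form as (12) with $\{\xi_\ell\}$ replaced by $\{\eta_\ell\}$.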



B. Optimization 

In general, finding the best ordinary PC requires an exhaustive search over all compositions of $n$. (Assuming a precomputation of all the order statistic means, the computation of the distortion for a given composition through either (5) or (6) is simple [2].) The search space can be reduced for certain distributions of $X$ using [2, Thm. 3], but seeking the optimal code still quickly becomes intractable as $n$ increases.

Our generalization makes the design problem considerably more difficult. Not only do we need $J$ compositions, but the distortion for a given composition is not as easy to compute. Because of the minimization over $j$ in (12), we lack a simple expression for the $\mu_i^j$s in terms of the composition and the order statistic means as given in (7). The relevant means are of conditional order statistics, conditioned on which subcodebook is selected; this depends on all $J$ compositions.

In the remainder of the paper, we consider two ways to reduce the design complexity. In Section IV, we fix all subcodebooks to have a common composition. Along with reducing the design space, this restriction induces a structure in the full codebook that enables the joint design of $\{\mu_i^j\}_{j=1}^{J}$ for any $i$. In Section V, we take a brief detour into the optimal rate allocations in a wrapped spherical shape-gain vector quantizer with gain-dependent shape codebook. We use these rate allocations to pick the sizes of subcodebooks $\{\mathcal{C}_j\}_{j=1}^{J}$.

The simplifications presented here still leave high design complexity for large $n$. Thus, some simulations use complexity-reducing heuristics including our conjecture that an analogue to [2, Thm. 3] holds. Since our numerical designs are not provably optimal, the improvements from allowing multiple initial codewords could be somewhat larger than we demonstrate.



IV. Design with Common Composition 

In this section, assume that the $J$ compositions are identical, i.e., the $n_i^j$s have no dependence on $j$. The subcodebook sizes are also equal, and dropping unnecessary sub- and superscripts we write the common composition as $\{n_i\}_{i=1}^{K}$ and the size of a single subcodebook as $M$.



A. Common Compositions Give Common Conic Partitions

The Voronoi regions of the code now have a special geometric structure. Recall that any spherical code partitions $\mathbb{R}^n$ into (unbounded) convex cones. Having a common composition implies that each subcodebook induces the same conic Voronoi structure on $\mathbb{R}^n$. The full code divides each of the $M$ cones into $J$ Voronoi regions.

The following theorem precisely maps the encoding of a CPC to a vector quantization problem. For compositions other than $(1, 1, \ldots, 1)$, the VQ design problem is in a dimension strictly lower than $n$.

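Theorem 1: For fixed common composition $(n_1, n_2, \ldots, n_K)$, the initial codewords $\{(\mu_1^j, \ldots, \mu_1^j, \ldots, \mu_K^j, \ldots, \mu_K^j)\}_{j=1}^{J}$ of a Variant I CPC are optimal if and only if $\{\mu^1, \ldots, \mu^J\}$ are representation points of the optimal $J$-point vector quantization of $\bar{\xi} \in \mathbb{R}^K$, where

\[ \mu^j = \left( \sqrt{n_1}\, \mu_1^j, \sqrt{n_2}\, \mu_2^j, \ldots, \sqrt{n_K}\, \mu_K^j \right), \quad 1 \le j \le J, \]

\[ \bar{\xi} = \left( \frac{1}{\sqrt{n_1}} \sum_{\ell \in I_1} \xi_\ell, \; \frac{1}{\sqrt{n_2}} \sum_{\ell \in I_2} \xi_\ell, \; \ldots, \; \frac{1}{\sqrt{n_K}} \sum_{\ell \in I_K} \xi_\ell \right). \]

Proof: Rewrite the distortion as follows:

\[ nD = E\left[ \min_{1 \le j \le J} \sum_{i=1}^{K} \sum_{\ell \in I_i} (\xi_\ell - \mu_i^j)^2 \right] = E\left[ \min_{1 \le j \le J} \|\bar{\xi} - \mu^j\|^2 \right] + E\left[ \|X\|^2 \right] - E\left[ \|\bar{\xi}\|^2 \right]. \tag{13} \]

Since the second and third terms of (13) do not depend on $\{\hat{x}^j_{\mathrm{init}}\}_{j=1}^{J}$, minimizing $D$ is equivalent to minimizing the first term of (13). By definition of a $K$-dimensional VQ, that term is minimized if and only if $\{\mu^1, \ldots, \mu^J\}$ are optimal representation points of the $J$-point VQ of random vector $\bar{\xi}$, completing the proof. ■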
For any fixed composition, one can implement the $J$-point VQ design inspired by Theorem 1 using the Lloyd-Max algorithm [18], [19] to obtain $\{\mu^1, \ldots, \mu^J\} \subset \mathbb{R}^K$ and then apply the mapping stated in the theorem to obtain the $J$ desired initial codewords in $\mathbb{R}^n$. Theorem 1 can be trivially extended for Variant II codes by simply replacing $\{\xi_\ell\}$ with $\{\eta_\ell\}$.
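A minimal sketch of this reduced-dimension design is given below, under several assumptions of this example: the source is i.i.d. $\mathcal{N}(0, 1)$, the expectation is replaced by an empirical average over generated training vectors, and a plain generalized Lloyd iteration with random initialization stands in for whatever VQ design method is preferred. Each sorted training vector is collapsed to the $K$-dimensional vector whose $i$th entry is the sum over group $I_i$ divided by $\sqrt{n_i}$; after clustering, dividing the $i$th centroid coordinate by $\sqrt{n_i}$ recovers the codeword value. The function names are hypothetical.

```python
import random
from math import sqrt

def reduced_vectors(comp, samples, seed=0):
    """Training vectors in R^K: scaled group sums of sorted Gaussian vectors."""
    rng = random.Random(seed)
    n = sum(comp)
    out = []
    for _ in range(samples):
        xi = sorted((rng.gauss(0.0, 1.0) for _ in range(n)), reverse=True)
        vec, start = [], 0
        for count in comp:
            vec.append(sum(xi[start:start + count]) / sqrt(count))
            start += count
        out.append(vec)
    return out

def lloyd(points, J, iters=30, seed=1):
    """Generalized Lloyd iteration for a J-point VQ of the given points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, J)]
    K = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * K for _ in range(J)]
        counts = [0] * J
        for p in points:
            j = min(range(J),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            counts[j] += 1
            for i in range(K):
                sums[j][i] += p[i]
        for j in range(J):
            if counts[j]:  # keep the old center if a cell is empty
                centers[j] = [s / counts[j] for s in sums[j]]
    return centers

def initial_codewords(comp, centers):
    """Map K-dimensional centroids back to the values of each initial codeword."""
    return [[c / sqrt(count) for c, count in zip(center, comp)]
            for center in centers]
```

For example, `initial_codewords(comp, lloyd(reduced_vectors(comp, 2000), 3))` with `comp = [2, 3, 2]` returns three lists of $K = 3$ codeword values, so the clustering runs in three dimensions rather than seven.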

Figure 1 compares the performance of an ordinary Variant I PC ($J = 1$) with variable-rate CPCs with $J = 3$ initial vectors. For a given composition, the distortion of the optimal ordinary PC is computed using (7) and variances of the order statistics (see [2, Eq. (13)]), whereas that of the optimal CPC is estimated empirically from 500000 samples generated according to the $\mathcal{N}(0, 1)$ distribution. Figure 1 and several subsequent figures include for comparison the rate-distortion bound and the performances of two types of entropy-constrained scalar quantization: uniform thresholds with uniform codewords (labeled ECUSQ) and uniform thresholds with optimal codewords (labeled ECSQ). At all rates, the latter is a very close approximation to optimal ECSQ; in particular, it has optimal rate-distortion slope at rate zero [20].

Fig. 1. Rate-distortion performance for variable-rate coding of i.i.d. $\mathcal{N}(0, 1)$ source with block length $n = 7$. Ordinary Variant I PCs ($J = 1$) are compared with CPCs with $J = 3$. Codes with common compositions are designed according to Theorem 1. Codes with different compositions are designed with heuristic selection of compositions guided by Conjecture 2 and Algorithm 1. For clarity, amongst approximately-equal rates, only operational points with the lowest distortion are plotted.



B. Optimization of Composition 

Although the optimization of compositions is not easy even for ordinary PCs, for a certain class of distributions, there is a useful necessary condition for the optimal composition [2, Thm. 3]. The following conjecture is an analogue of that condition.

Conjecture 1: Suppose that $J > 1$ and that $E[\eta_\ell]$ is a convex function of $\ell$, i.e.

\[ E[\eta_{\ell+2}] - 2 E[\eta_{\ell+1}] + E[\eta_\ell] \ge 0, \quad 1 \le \ell \le n - 2. \tag{14} \]

Then the optimum $n_i$ for Variant II CPCs increases monotonically with $i$.

The convexity of $E[\eta_\ell]$ holds for a large class of source distributions (see [2, Thm. 4]), including Gaussian ones. Conjecture 1 greatly reduces the search space for optimal compositions for such sources.

The conjecture is proven if one can show that the distortion associated with the composition $(n_1, \ldots, n_m, n_{m+1}, \ldots, n_K)$, where $n_m > n_{m+1}$, can be decreased by reversing the roles of $n_m$ and $n_{m+1}$. As a plausibility argument for the conjecture, we will show that the reversing has the desired property when an additional constraint is imposed on the codewords. With the composition fixed, let

\[ \zeta = \frac{1}{r} \sum_{\ell = L+1}^{L+r} \eta_\ell - \frac{2}{q - r} \sum_{\ell = L+r+1}^{L+q} \eta_\ell + \frac{1}{r} \sum_{\ell = L+q+1}^{L+q+r} \eta_\ell, \tag{15} \]

where $L = n_1 + n_2 + \cdots + n_{m-1}$. The convexity of $E[\eta_\ell]$ implies the nonnegativity of $E[\zeta]$ (see [2, Thm. 2]). Using the total expectation theorem, $E[\zeta]$ can be written as the difference of two nonnegative terms,

\[ \zeta_+ = \Pr(\zeta \ge 0)\, E[\zeta \mid \zeta \ge 0] \quad \text{and} \quad \zeta_- = -\Pr(\zeta < 0)\, E[\zeta \mid \zeta < 0]. \]

Since $E[\zeta] \ge 0$ and probabilities are nonnegative, it is clear that $\zeta_+ \ge \zeta_-$. Therefore, the following set is non-empty:

With the notations above, we are now ready to state the proposition. If the restriction of the codewords were known to not preclude optimality, then Conjecture 1 would be proven.

Proposition 2: Suppose that $J > 1$ and $E[\eta_\ell]$ is a convex function of $\ell$. If $n_m > n_{m+1}$ for some $m$, and the constraint given in (16) is imposed on the codewords, then the distortion associated with the composition $(n_1, \ldots, n_m, n_{m+1}, \ldots, n_K)$ can be decreased by reversing the roles of $n_m$ and $n_{m+1}$.

Proof: See Appendix A. ■

A straightforward extension of Conjecture 1 for Variant I codes is the following:

Conjecture 2: Suppose that $J > 1$, and that $E[\xi_\ell]$ is convex over $S_1 = \{1, 2, \ldots, \lfloor K/2 \rfloor\}$ and concave over $S_2 = \{\lfloor K/2 \rfloor + 1, \lfloor K/2 \rfloor + 2, \ldots, K\}$. Then the optimum $n_i$ for Variant I CPCs increases monotonically with $i \in S_1$ and decreases monotonically with $i \in S_2$.

The convexity of $E[\xi_\ell]$ holds for a large class of source distributions (see [2, Thm. 5]). We will later restrict the compositions, while doing simulations for Variant I codes and Gaussian sources, to satisfy Conjecture 2.



V. Design with Different Compositions 

Suppose now that the compositions of subcodebooks can be different. The Voronoi partitioning of $\mathbb{R}^n$ is much more complicated, lacking the separability discussed in the previous section.² Furthermore, the apparent design complexity for the compositions is increased greatly to equal the number of compositions raised to the $J$th power, namely $2^{J(n-1)}$.

In this section we first outline an algorithm for local optimization of initial vectors with all the compositions fixed. Then we address a portion of the composition design problem which is the sizing of the subcodebooks. For this, we extend the high-resolution analysis of [7]. For brevity, we limit our discussion to Variant I CPCs; Variant II could be generalized similarly.



A. Local Optimization of Initial Vectors 

Let $\xi = (\xi_1, \xi_2, \ldots, \xi_n)$ denote the ordered vector of $X$. Given $J$ initial codewords $\{\hat{x}^j_{\mathrm{init}}\}_{j=1}^{J}$, for each $j$, let $R_j \subset \mathbb{R}^n$ denote the quantization region of $\xi$ corresponding to codeword $\hat{x}^j_{\mathrm{init}}$, and let $E_j[\cdot]$ denote the expectation conditioned on $\xi \in R_j$. If $R_j$ is fixed, consider the distortion conditioned on $\xi \in R_j$

\[ D_j = n^{-1} E\left[ \sum_{i=1}^{K_j} \sum_{\ell \in I_i^j} (\xi_\ell - \mu_i^j)^2 \;\middle|\; \xi \in R_j \right]. \tag{17} \]

By extension of an argument in [2], $D_j$ is minimized with

\[ \mu_i^j = \frac{1}{n_i^j} \sum_{\ell \in I_i^j} E_j[\xi_\ell], \quad 1 \le i \le K_j. \tag{18} \]

For a given set $\{R_j\}_{j=1}^{J}$, since the total distortion is determined by

\[ D = \sum_{j=1}^{J} \Pr(\xi \in R_j)\, D_j, \]

it will decrease if the $\mu_i^j$ are set to the new values given by (18) for all $1 \le j \le J$ and for all $1 \le i \le K_j$.

From the above analysis, a Lloyd algorithm can be developed to design initial codewords as given in Algorithm 1. This algorithm is similar to the algorithm in [10], but here the compositions can be arbitrary. Algorithm 1 was used to produce the operating points shown in Figure 1 for CPCs with different compositions in which the distortion of a locally-optimal code was computed empirically from 500000 samples generated according to the $\mathcal{N}(0, 1)$ distribution. We can see through the figure that common compositions can produce almost the same distortion as possibly-different compositions for the same rate. However, allowing the compositions to be different yields many more rates. The number of rates is explored in Appendix B.



²For a related two-dimensional visualization, compare [21, Fig. 3] against [21, Figs. 7-13].

Algorithm 1 Lloyd Algorithm for Initial Codeword Optimization from Given Composition

1) Order vector $X$ to get $\xi$.

2) Choose an arbitrary initial set of $J$ representation vectors $\hat{x}^1_{\mathrm{init}}, \hat{x}^2_{\mathrm{init}}, \ldots, \hat{x}^J_{\mathrm{init}}$.

3) For each $j$, determine the corresponding quantization region $R_j$ of $\xi$.

4) For each $j$, set $\hat{x}^j_{\mathrm{init}}$ to the new value given by (18).

5) Repeat steps 3 and 4 until further improvement in MSE is negligible.
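The steps of Algorithm 1 can be sketched on an empirical training set as follows. This is an illustrative sketch, not the authors' implementation: the function names, the $\mathcal{N}(0, 1)$ training data, and in particular the initialization (scaled copies of the unconditional group means, so that the $J$ starting codewords lie on different spheres) are assumptions of this example. The conditional expectations in the update are replaced by sample averages over each region, then averaged over each index group.

```python
import random

def group_means(vec, comp):
    """Average vec over the consecutive index groups defined by comp."""
    out, start = [], 0
    for count in comp:
        out.append(sum(vec[start:start + count]) / count)
        start += count
    return out

def expand(mus, comp):
    """Initial codeword: mus[i] repeated comp[i] times."""
    return [mu for mu, count in zip(mus, comp) for _ in range(count)]

def lloyd_cpc(comps, samples=4000, iters=15, seed=0):
    """Locally optimize initial codewords for fixed (possibly different) compositions."""
    rng = random.Random(seed)
    n, J = sum(comps[0]), len(comps)
    # Step 1: ordered training vectors.
    data = [sorted((rng.gauss(0.0, 1.0) for _ in range(n)), reverse=True)
            for _ in range(samples)]
    avg = [sum(v[l] for v in data) / samples for l in range(n)]
    # Step 2: initial codewords (assumed initialization, see lead-in).
    mus = [group_means([2.0 * (j + 1) / (J + 1) * a for a in avg], comps[j])
           for j in range(J)]
    history = []
    for _ in range(iters):
        codewords = [expand(mus[j], comps[j]) for j in range(J)]
        sums = [[0.0] * n for _ in range(J)]
        counts = [0] * J
        total = 0.0
        # Step 3: assign each ordered vector to its nearest initial codeword.
        for v in data:
            dists = [sum((a - b) ** 2 for a, b in zip(v, cw)) for cw in codewords]
            j = dists.index(min(dists))
            total += dists[j]
            counts[j] += 1
            for l in range(n):
                sums[j][l] += v[l]
        history.append(total / (n * samples))
        # Step 4: region-conditional group means become the new codeword values.
        for j in range(J):
            if counts[j]:
                mus[j] = group_means([s / counts[j] for s in sums[j]], comps[j])
    return mus, history
```

The returned `history` holds the empirical per-letter distortion at each iteration; on a fixed training set it cannot increase, which gives a simple stopping test for step 5.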



B. Wrapped Spherical Shape-Gain Vector Quantization 

Hamkins and Zeger [7] introduced a type of spherical code for $\mathbb{R}^n$ where a lattice in $\mathbb{R}^{n-1}$ is "wrapped" around the code sphere. They applied the wrapped spherical code (WSC) to the shape component in a shape-gain vector quantizer.

We generalize this construction to allow the size of the shape codebook to depend on the gain. Along this line of thinking, Hamkins [22, pp. 102-104] provided an algorithm to optimize the number of codewords on each sphere. However, neither analytic nor experimental improvement was demonstrated. In contrast, our approach based on high-resolution optimization gives an explicit expression for the improvement in signal-to-noise ratio (SNR). While our results may be of independent interest, our present purpose is to guide the selection of $\{M_j\}_{j=1}^{J}$ in CPCs.

A shape-gain vector quantizer (VQ) decomposes a source vector $X$ into a gain $g = \|X\|$ and a shape $S = X/g$, which are quantized to $\hat{g}$ and $\hat{S}$, respectively, and the approximation is $\hat{X} = \hat{g} \cdot \hat{S}$. We optimize here a wrapped spherical VQ with gain-dependent shape codebook. The gain codebook, $\{\hat{g}_1, \hat{g}_2, \ldots, \hat{g}_J\}$, is optimized for the gain pdf, e.g., using the scalar Lloyd-Max algorithm [18], [19]. For each gain codeword $\hat{g}_j$, a shape subcodebook is generated by wrapping the sphere packing $\Lambda \subset \mathbb{R}^{n-1}$ on to the unit sphere in $\mathbb{R}^n$. The same $\Lambda$ is used for each $j$, but the density (or scaling) of the packing may vary with $j$. Thus the normalized second moment $G(\Lambda)$ applies for each $j$ while minimum distance $d_\Lambda^j$ depends on the quantized gain $\hat{g}_j$. We denote such a sphere packing as $(\Lambda, d_\Lambda^j)$.

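The per-letter MSE distortion will be

\[ D = n^{-1} E\left[ \|X - \hat{g}\hat{S}\|^2 \right] \]
\[ \phantom{D} = n^{-1} E\left[ \|X - \hat{g}S\|^2 \right] + 2 n^{-1} E\left[ (X - \hat{g}S)^T (\hat{g}S - \hat{g}\hat{S}) \right] + n^{-1} E\left[ \|\hat{g}S - \hat{g}\hat{S}\|^2 \right] \]
\[ \phantom{D} = \underbrace{n^{-1} E\left[ \|X - \hat{g}S\|^2 \right]}_{D_g} + \underbrace{n^{-1} E\left[ \|\hat{g}S - \hat{g}\hat{S}\|^2 \right]}_{D_s}, \]

where the omitted cross term is zero due to the independence of $g$ and $\hat{g}$ from $S$ [7]. The gain distortion, $D_g$, is given by

\[ D_g = \frac{1}{n} \int_0^\infty (r - \hat{g}(r))^2 f_g(r)\, dr, \]

where $\hat{g}(\cdot)$ is the quantized gain and $f_g(\cdot)$ is the pdf of $g$.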

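Conditioned on the gain codeword $\hat{g}_j$ chosen, the shape $S$ is distributed uniformly on the unit sphere in $\mathbb{R}^n$, which has surface area $S_n = 2\pi^{n/2}/\Gamma(n/2)$. Thus, as shown in [7], for asymptotically high shape rate $R_s$, the conditional distortion $E[\|S - \hat{S}\|^2 \mid \hat{g}_j]$ is equal to the distortion of the lattice quantizer with codebook $(\Lambda, d_\Lambda^j)$ for a uniform source in $\mathbb{R}^{n-1}$. Thus,

\[ E\left[ \|S - \hat{S}\|^2 \mid \hat{g}_j \right] = (n - 1)\, G(\Lambda)\, V_j(\Lambda)^{2/(n-1)}, \tag{19} \]

where $V_j(\Lambda)$ is the volume of a Voronoi region of the $(n-1)$-dimensional lattice $(\Lambda, d_\Lambda^j)$. Therefore, for a given gain codebook $\{\hat{g}_1, \hat{g}_2, \ldots, \hat{g}_J\}$, the shape distortion $D_s$ can be approximated by

\[ D_s = \frac{1}{n} E\left[ \|\hat{g}S - \hat{g}\hat{S}\|^2 \right] = \frac{1}{n} \sum_{j=1}^{J} p_j \hat{g}_j^2\, E\left[ \|S - \hat{S}\|^2 \mid \hat{g} = \hat{g}_j \right] \]
\[ \phantom{D_s} \overset{(a)}{=} \frac{1}{n} \sum_{j=1}^{J} p_j \hat{g}_j^2 (n - 1)\, G(\Lambda)\, V_j(\Lambda)^{2/(n-1)} \]
\[ \phantom{D_s} \overset{(b)}{\approx} \frac{1}{n} \sum_{j=1}^{J} p_j \hat{g}_j^2 (n - 1)\, G(\Lambda) \left( S_n / M_j \right)^{2/(n-1)} \]
\[ \phantom{D_s} = C \cdot \sum_{j=1}^{J} p_j \hat{g}_j^2 M_j^{-2/(n-1)}, \]

where $p_j$ is the probability of $\hat{g}_j$ being chosen; (a) follows from (19); (b) follows from the high-rate assumption and neglecting the overlapping regions, with $M_j$ representing the number of codewords in the shape subcodebook associated with $\hat{g}_j$; and

\[ C = \frac{n - 1}{n}\, G(\Lambda) \left( 2\pi^{n/2} / \Gamma(n/2) \right)^{2/(n-1)}. \tag{20} \]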



C. Rate Allocations 



The optimal rate allocation for high-resolution approximation to WSC given below will be used as the rate 
allocation across subcodebooks in our CPCs. 

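1) Variable-Rate Coding: Before stating the theorem, we need the following lemma.

Lemma 1: If there exist constants $C_s$ and $C_g$ such that

\[ \lim_{R_s \to \infty} D_s \cdot 2^{2(n/(n-1)) R_s} = C_s \quad \text{and} \quad \lim_{R_g \to \infty} D_g \cdot 2^{2 n R_g} = C_g, \]

then the minimum of $D = D_s + D_g$ subject to the constraint $R = R_s + R_g$ satisfies

\[ \lim_{R \to \infty} D\, 2^{2R} = \frac{n}{(n-1)^{1 - 1/n}} \cdot C_g^{1/n} C_s^{1 - 1/n} \]

and is achieved by $R_s = R_s^*$ and $R_g = R_g^*$ where

\[ R_s^* = \left( \frac{n-1}{n} \right) \left[ R + \frac{1}{2n} \log\left( \frac{C_s}{C_g} \cdot \frac{1}{n-1} \right) \right], \tag{21} \]

\[ R_g^* = \left( \frac{1}{n} \right) \left[ R - \frac{n-1}{2n} \log\left( \frac{C_s}{C_g} \cdot \frac{1}{n-1} \right) \right]. \tag{22} \]

Proof: See [7, Thm. 1]. ■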
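The lemma can be checked numerically under the idealization that the high-resolution forms hold exactly, i.e. that the shape distortion equals $C_s 2^{-2(n/(n-1))R_s}$ and the gain distortion equals $C_g 2^{-2nR_g}$. The sketch below makes that assumption, takes logarithms base 2, and uses hypothetical function names; the split it computes is $R_s^* = \frac{n-1}{n}\left[R + \frac{1}{2n}\log\frac{C_s}{C_g(n-1)}\right]$ with $R_g^* = R - R_s^*$.

```python
from math import log2

def optimal_split(R, n, Cs, Cg):
    """Shape and gain rates that minimize the idealized total distortion."""
    t = log2(Cs / (Cg * (n - 1)))
    Rs = (n - 1) / n * (R + t / (2 * n))
    Rg = (R - (n - 1) / (2 * n) * t) / n
    return Rs, Rg

def high_res_distortion(Rs, Rg, n, Cs, Cg):
    """Idealized D = D_s + D_g using the asymptotic forms as exact expressions."""
    return Cs * 2 ** (-2 * n / (n - 1) * Rs) + Cg * 2 ** (-2 * n * Rg)
```

Perturbing the split in either direction while holding the total rate fixed increases the idealized distortion, and the value at the optimum times $2^{2R}$ equals $\frac{n}{(n-1)^{1-1/n}} C_g^{1/n} C_s^{1-1/n}$.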

Theorem 2: Let $X \in \mathbb{R}^n$ be an i.i.d. $\mathcal{N}(0, \sigma^2)$ vector, and let $\Lambda$ be a lattice in $\mathbb{R}^{n-1}$ with normalized second moment $G(\Lambda)$. Suppose $X$ is quantized by an $n$-dimensional shape-gain VQ at rate $R = R_s + R_g$ with gain-dependent shape codebook constructed from $\Lambda$ with different minimum distances. Also, assume that a variable-rate coding follows the quantization. Then, the asymptotic decay of the minimum mean-squared error $D$ is given by

\[ \lim_{R \to \infty} D\, 2^{2R} = \frac{n}{(n-1)^{1 - 1/n}} \cdot C_g^{1/n} C_s^{1 - 1/n} \tag{23} \]

and is achieved by $R_s = R_s^*$ and $R_g = R_g^*$, where $R_s^*$ and $R_g^*$ are given in (21) and (22),

\[ C_s = \frac{n-1}{n}\, G(\Lambda) \left( 2\pi^{n/2} / \Gamma(n/2) \right)^{2/(n-1)} \cdot 2\sigma^2 e^{\psi(n/2)}, \]

\[ C_g = \sigma^2 \cdot \frac{3^{n/2}\, \Gamma^3\left( \frac{n+2}{6} \right)}{8 n\, \Gamma(n/2)}, \]
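and $\psi(\cdot)$ is the digamma function.

Proof: We first minimize $D_s$ for a given gain codebook $\{\hat{g}_j\}_{j=1}^{J}$. From (20), ignoring the constant $C$, we must perform the minimization

\[ \min_{M_1, \ldots, M_J} \sum_{j=1}^{J} p_j \hat{g}_j^2 M_j^{2/(1-n)} \quad \text{subject to} \quad \sum_{j=1}^{J} p_j \log M_j = n R_s. \tag{24} \]

Using a Lagrange multiplier to get an unconstrained problem, we obtain the objective function

\[ f = \sum_{j=1}^{J} p_j \hat{g}_j^2 M_j^{2/(1-n)} - \lambda \sum_{j=1}^{J} p_j \log M_j. \]

Neglecting the integer constraint, we can take the partial derivatives

\[ \frac{\partial f}{\partial M_j} = \frac{2}{1-n}\, p_j \hat{g}_j^2 M_j^{(n+1)/(1-n)} - \lambda p_j M_j^{-1}, \quad 1 \le j \le J. \]

Setting $\partial f / \partial M_j = 0$, $1 \le j \le J$, yields

\[ M_j = \left[ \lambda (1-n) / (2 \hat{g}_j^2) \right]^{(1-n)/2}. \tag{25} \]

Substituting into the constraint (24), we get

\[ \sum_{j=1}^{J} p_j \log \left[ \lambda (1-n) / (2 \hat{g}_j^2) \right]^{(1-n)/2} = n R_s. \]

Thus,

\[ \left[ \lambda (1-n)/2 \right]^{(1-n)/2} = 2^{\, n R_s - (n-1) \sum_{j=1}^{J} p_j \log \hat{g}_j} = 2^{\, n R_s - (n-1) E[\log \hat{g}]}. \]

Therefore, it follows from (25) that the optimal size for the $j$th shape subcodebook for a given gain codebook is

\[ M_j = \hat{g}_j^{\, n-1} \cdot 2^{\, n R_s - (n-1) E[\log \hat{g}]}, \quad 1 \le j \le J. \tag{26} \]

The resulting shape distortion is

\[ D_s \approx C \cdot \sum_{j=1}^{J} p_j \hat{g}_j^2 M_j^{-2/(n-1)} = C \cdot 2^{-2(n/(n-1)) R_s} \cdot 2^{2 E[\log \hat{g}]}, \]

where $C$ is the same constant as specified in (20). Hence,

\[ \lim_{R_s \to \infty} D_s \cdot 2^{2(n/(n-1)) R_s} = C \cdot \lim_{R_s \to \infty} 2^{2 E[\log \hat{g}]} \overset{(a)}{=} C \cdot 2^{2 E[\log g]} \overset{(b)}{=} C \cdot 2\sigma^2 e^{\psi(n/2)} = C_s, \tag{27} \]

where (a) follows from the high-rate assumption; and (b) follows from computing the expectation $E[\log g]$. On the other hand, it is shown in [7, Thm. 1] that

\[ \lim_{R \to \infty} D_g \cdot 2^{2n(R - R_s)} = \lim_{R_g \to \infty} D_g \cdot 2^{2 n R_g} = C_g. \tag{28} \]

The limits (27) and (28) now allow us to apply Lemma 1 to obtain the desired result. ■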
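The variable-rate allocation derived in the proof, $M_j = \hat{g}_j^{\,n-1} \cdot 2^{\,nR_s - (n-1)E[\log \hat{g}]}$, can be sketched as below. The function name is hypothetical, logarithms are base 2, and the sizes are left as real numbers, mirroring the proof's neglect of the integer constraint.

```python
from math import log2

def variable_rate_sizes(gains, probs, n, Rs):
    """Shape subcodebook sizes for the variable-rate case.

    gains[j] is the gain codeword, probs[j] its probability, n the block
    length, and Rs the per-letter shape rate in bits.
    """
    mean_log_gain = sum(p * log2(g) for p, g in zip(probs, gains))
    scale = 2 ** (n * Rs - (n - 1) * mean_log_gain)
    return [g ** (n - 1) * scale for g in gains]
```

With this allocation, $\hat{g}_j^2 M_j^{-2/(n-1)}$ is the same for every $j$, so each sphere contributes the same conditional shape distortion, and larger spheres receive proportionally more codewords.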
Through this theorem we can verify the rate-distortion improvement as compared to independent shape-gain 
encoding by comparing Cg and Cs in the distortion formula to the analogous quantities in IT) Thm. I]- Cg remains 
the same whereas Cs, which plays a more significant role in the distortion formula, is scaled by a factor of 
2e'^("/2)/n < 1. In particular, the improvement in signal-to-quantization noise ratio achieved by the WSC with 
gain-dependent shape codebook is given by 

AsNR (in dB) = -10(1 - l/n) log^^i^e^^""^^^ /n). (29) 

From the theory of the gamma function [23, Eq. 29], we know that, for $s \in \mathbb{C}$,

$$\lim_{|s|\to\infty} \left[\psi(s) - \ln(s)\right] = 0.$$

It follows that $[\psi(n/2) - \ln(n/2)] \to 0$, and thus $\Delta_{\mathrm{SNR}}(n) \to 0$, as $n \to \infty$; this is not surprising because of the "sphere hardening" effect. This improvement is plotted in Figure 2 as a function of block length $n$ in the range between 5 and 50.
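The improvement in (29) is easy to evaluate. The sketch below is illustrative only: `digamma` is a hand-written approximation (upward recurrence followed by the standard asymptotic series) used to avoid external dependencies, and the function names are assumptions rather than anything defined in the paper.

```python
import math

def digamma(x):
    """Approximate psi(x) for x > 0: shift x upward, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x      # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def delta_snr_db(n):
    """SNR improvement of (29) in dB for block length n."""
    factor = 2.0 * math.exp(digamma(n / 2.0)) / n
    return -10.0 * (1.0 - 1.0 / n) * math.log10(factor)

for n in (5, 10, 25, 50):
    print(n, round(delta_snr_db(n), 4))
```

The printed values are positive and shrink as $n$ grows, in line with the limit $\psi(n/2) - \ln(n/2) \to 0$.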
Fig. 2. Improvement in signal-to-quantization noise ratio of WSC with gain-dependent shape quantizer specified in Theorem 2, as compared to the asymptotic rate-distortion performance given in [?, Thm. 1]. Horizontal axis: block length.


2) Fixed-Rate Coding: A similar optimal rate allocation is possible for fixed-rate coding. 

Theorem 3: Let $X \in \mathbb{R}^n$ be an i.i.d. $\mathcal{N}(0,\sigma^2)$ vector, and let $\Lambda$ be a lattice in $\mathbb{R}^{n-1}$ with normalized second moment $G(\Lambda)$. Suppose $X$ is quantized by an $n$-dimensional shape-gain VQ at rate $R$ with gain-dependent shape codebook constructed from $\Lambda$ with different minimum distances. Also, assume that $J$ gain codewords are used and that a fixed-rate coding follows the quantization. Then, the optimal number of codewords in each subcodebook is

$$M_j = 2^{nR} \cdot \frac{(p_j g_j^2)^{(n-1)/(n+1)}}{\sum_{k=1}^{J} (p_k g_k^2)^{(n-1)/(n+1)}}, \quad 1 \le j \le J, \tag{30}$$

where $\{g_1, g_2, \ldots, g_J\}$ is the optimal gain codebook. The resulting asymptotic decay of the shape distortion $D_s$ is given by

$$\lim_{R\to\infty} D_s \cdot 2^{2(n/(n-1))R} = C \cdot \left[\sum_{j=1}^{J} (p_j g_j^2)^{(n-1)/(n+1)}\right]^{(n+1)/(n-1)}, \tag{31}$$

where $p_j$ is the probability of $g_j$ being chosen and $C$ is the same constant as given in (20).

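Proof: For a given gain codebook $\{g_j\}_{j=1}^{J}$, the optimal subcodebook sizes are given by the optimization

$$\min_{M_1,\ldots,M_J} \sum_{j=1}^{J} p_j g_j^2 M_j^{2/(1-n)} \quad \text{subject to} \quad \sum_{j=1}^{J} M_j = 2^{nR}. \tag{32}$$

Similarly to the variable-rate case, we can use a Lagrange multiplier to obtain an unconstrained optimization with the objective function

$$f = \sum_{j=1}^{J} p_j g_j^2 M_j^{2/(1-n)} - \lambda \sum_{j=1}^{J} M_j.$$

Again, assuming high rate, we can ignore the integer constraints on $M_j$ to take partial derivatives. Setting them equal to zero, one can obtain

$$M_j = \left[\lambda(1-n)/(2p_j g_j^2)\right]^{(1-n)/(n+1)}. \tag{33}$$

Substituting into the constraint (32) yields

$$\sum_{j=1}^{J} \left[\lambda(1-n)/(2p_j g_j^2)\right]^{(1-n)/(n+1)} = 2^{nR}.$$

Hence,

$$\lambda^{(n-1)/(n+1)} = 2^{-nR} \sum_{k=1}^{J} \left[\frac{1-n}{2p_k g_k^2}\right]^{(1-n)/(n+1)}. \tag{34}$$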
Fig. 3. High-resolution approximation of the rate-distortion performance of WSC with gain-dependent shape codebooks and fixed-rate coding for an i.i.d. $\mathcal{N}(0, 1)$ source with block length $n = 25$. Horizontal axis: rate.



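Combining (34) and (33) gives us

$$M_j = \lambda^{(1-n)/(n+1)} \left[\frac{1-n}{2p_j g_j^2}\right]^{(1-n)/(n+1)} = 2^{nR} \cdot \frac{(p_j g_j^2)^{(n-1)/(n+1)}}{\sum_{k=1}^{J} (p_k g_k^2)^{(n-1)/(n+1)}}, \quad 1 \le j \le J. \tag{35}$$

With the high-rate assumption, the resulting shape distortion will be

$$D_s = C \sum_{j=1}^{J} p_j g_j^2 M_j^{2/(1-n)} = C \cdot \left[\sum_{j=1}^{J} (p_j g_j^2)^{(n-1)/(n+1)}\right]^{(n+1)/(n-1)} \cdot 2^{-2(n/(n-1))R},$$

where $C = \frac{n-1}{n}\, G(\Lambda) \left(2\pi^{n/2}/\Gamma(n/2)\right)^{2/(n-1)}$, completing the proof. ■

Figure 3 illustrates the resulting performance as a function of the rate for several values of $J$. As expected, for a fixed block size $n$, higher rates require higher values of $J$ (more concentric spheres) to attain good performance, and the best performance is improved by increasing the maximum value for $J$.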
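The fixed-rate allocation can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the gain codebook and probabilities below are hypothetical (in the paper they would come from an optimized gain quantizer), the sizes are left real-valued, and the function names are invented for this example.

```python
def fixed_rate_sizes(gains, probs, n, rate):
    """Real-valued sizes M_j proportional to (p_j g_j^2)^((n-1)/(n+1)), summing to 2^(nR)."""
    exponent = (n - 1) / (n + 1)
    weights = [(p * g * g) ** exponent for p, g in zip(probs, gains)]
    total = sum(weights)
    return [2.0 ** (n * rate) * w / total for w in weights]

def normalized_shape_distortion(sizes, gains, probs, n):
    """Objective of (32): sum_j p_j g_j^2 M_j^(2/(1-n)), i.e. D_s / C."""
    return sum(p * g * g * m ** (2.0 / (1 - n))
               for p, g, m in zip(probs, gains, sizes))

# Hypothetical gain codebook and probabilities (illustration only).
gains = [1.5, 2.5, 3.5]
probs = [0.3, 0.5, 0.2]
n, rate = 7, 3.0

sizes = fixed_rate_sizes(gains, probs, n, rate)
print("sizes:", [round(m) for m in sizes])
print("total codewords:", sum(sizes), "target:", 2.0 ** (n * rate))
print("D_s / C:", normalized_shape_distortion(sizes, gains, probs, n))
```

Unlike the variable-rate case, the sizes here depend on the probabilities $p_j$ as well as on the gains, because the constraint is on the total number of codewords rather than on the average log-size.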



D. Using WSC Rate Allocation for Permutation Codes 

In this section we use the optimal rate allocations for WSC to guide the design of CPCs at a given rate. The rate allocations are used to set target sizes for each subcodebook. Then for each subcodebook $C_j$, a composition meeting the constraint on $M_j$ is selected (using heuristics inspired by Conjecture 2). Algorithm 1 of Section IV-A is then used for those compositions to compute the actual rate and distortion.

For the variable-rate case, Theorem 2 provides the key rate allocation step in the design procedure given in Algorithm 2. Similarly, Theorem 3 leads to the design procedure for the fixed-rate case given in Algorithm 3. Each case requires as input not only the rate $R$ but also the number of initial codewords $J$.

Results for the fixed-rate case are plotted in Figure 4. This demonstrates that using the rate allocation of WSC with gain-dependent shape codebook actually yields good CPCs for most of the rates. Figure 5 demonstrates the improvement that comes with allowing more initial codewords. The distortion is again computed empirically from Gaussian samples. It has a qualitative similarity with Figure 3.
Algorithm 2 Design Algorithm for Variable-Rate Case

1) Compute $R_s^*$ and $R_g^*$ from (21) and (22), respectively.

2) For $1 \le j \le J$, compute $M_j$ from (26).

3) For $1 \le j \le J$, search through all possible compositions of $n$ that satisfy Conjecture 2, choosing the one that produces the number of codewords closest to $M_j$.

4) Run Algorithm 1 for the $J$ compositions chosen in step 3 to generate the initial codewords and to compute the actual rate and distortion.



Algorithm 3 Design Algorithm for Fixed-Rate Case

1) Use the scalar Lloyd-Max algorithm to optimize $J$ gain codewords.

2) For $1 \le j \le J$, compute $M_j$ from (30).

3) Repeat steps 3 and 4 of Algorithm 2.
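Step 3 of Algorithm 2 (matching a composition to a target size) can be sketched as below. This is a simplified illustration: it searches all integer partitions of $n$ and counts codewords by the multinomial coefficient (Variant I), and it does not apply the filter from Conjecture 2, which is not reproduced here. The function names and the target value are hypothetical.

```python
from math import factorial

def partitions(n, max_part=None):
    """Yield the integer partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def multinomial(parts):
    """Number of distinct permutations for a composition: n! / (n_1! n_2! ... n_K!)."""
    count = factorial(sum(parts))
    for p in parts:
        count //= factorial(p)
    return count

def closest_composition(n, target_size):
    """Partition of n whose multinomial coefficient is closest to target_size."""
    return min(partitions(n), key=lambda parts: abs(multinomial(parts) - target_size))

# Hypothetical target size for one subcodebook (illustration only).
best = closest_composition(7, 200.0)
print(best, multinomial(best))
```

Because several compositions can share a multinomial coefficient, ties are broken here by enumeration order; a fuller design would compare the tied candidates by distortion.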



VI. Conclusions 

We have studied a generalization of permutation codes in which more than one initial codeword is allowed. 
This improves rate-distortion performance while adding very little to encoding complexity. However, the design 
complexity is increased considerably. To reduce the design complexity, we explore a method introduced by Lu 
et al. of restricting the subcodebooks to share a common composition; and we introduce a method of allocating 
rates across subcodebooks using high-resolution analysis of wrapped spherical codes. Simulations suggest that these 
heuristics are effective, but obtaining theoretical guarantees remains an open problem. 
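Appendix A
Proof of Proposition 2

Consider a new composition $\{n'_1, n'_2, \ldots, n'_K\}$ obtained by swapping $n_m$ and $n_{m+1}$, i.e.,

$$n'_i = \begin{cases} n_i, & i \ne m \text{ or } m+1; \\ n_{m+1}, & i = m; \\ n_m, & i = m+1. \end{cases}$$

Let $\{I'_i\}$ denote groups of indices generated by composition $\{n'_i\}$. Suppose that $D$ is the optimal distortion associated with $\{n_i\}$,

$$D = n^{-1} E\left[\min_{1 \le j \le J} \sum_{i=1}^{K} \sum_{\ell \in I_i} \left(\eta_\ell - \mu_i^j\right)^2\right],$$

where $\{\mu_i^j\}$ is the optimum of the minimization of the right side over $\Omega_m$. Consider a suboptimal distortion $D'$ associated with $\{n'_i\}$,

$$D' = n^{-1} E\left[\min_{1 \le j \le J} \sum_{i=1}^{K} \sum_{\ell \in I'_i} \left(\eta_\ell - \tilde{\mu}_i^j\right)^2\right],$$

where $\{\tilde{\mu}_i^j\}$ is constructed from $\{\mu_i^j\}$ as follows: for each $j$,

$$\tilde{\mu}_i^j = \begin{cases} \mu_i^j, & i \ne m \text{ or } m+1; \\[4pt] \dfrac{2n_m \mu_m^j + (n_{m+1} - n_m)\mu_{m+1}^j}{n_m + n_{m+1}}, & i = m; \\[8pt] \dfrac{(n_m - n_{m+1})\mu_m^j + 2n_{m+1}\mu_{m+1}^j}{n_m + n_{m+1}}, & i = m+1. \end{cases} \tag{36}$$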
Fig. 4. Rate-distortion performance of fixed-rate CPCs designed with compositions guided by the WSC rate allocation and Algorithm 1, in comparison with codes designed with exhaustive search over a heuristic subset of compositions guided by Conjecture 2. Computation uses i.i.d. $\mathcal{N}(0, 1)$ source, $n = 7$, and $J = 3$.

Fig. 5. Rate-distortion performance for variable-rate coding of i.i.d. $\mathcal{N}(0, 1)$ source with block length $n = 25$. CPCs with different compositions are designed using rate allocations from Theorem 5 and initial codewords locally optimized by Algorithm 1. The rate allocation computation assumes $G(\Lambda_{24}) \approx 0.065771$ [24, p. 61].



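Note that, for the above construction, we have $\tilde{\mu}_m^j - \tilde{\mu}_{m+1}^j = \mu_m^j - \mu_{m+1}^j$, for $j \in \{1, 2, \ldots, J\}$. Therefore $\{\tilde{\mu}_i^j\}$ also satisfies $\Omega_m$, and so forms a valid codebook corresponding to composition $\{n'_i\}$. Thus, it will be sufficient if we can show $D \ge D'$. On the other hand, it is easy to verify that, for all $j$,

$$n_{m+1}\left(\tilde{\mu}_m^j\right)^2 + n_m\left(\tilde{\mu}_{m+1}^j\right)^2 = n_m\left(\mu_m^j\right)^2 + n_{m+1}\left(\mu_{m+1}^j\right)^2.$$

Hence,

$$\sum_{i=1}^{K} n_i \left(\mu_i^j\right)^2 = \sum_{i=1}^{K} n'_i \left(\tilde{\mu}_i^j\right)^2, \quad \text{for all } j. \tag{37}$$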
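Now consider the difference between $D$ and $D'$:

$$\begin{aligned}
\Delta = n(D - D') &= E\left[\min_j \sum_{i=1}^{K} \sum_{\ell \in I_i} \left(\eta_\ell - \mu_i^j\right)^2\right] - E\left[\min_j \sum_{i=1}^{K} \sum_{\ell \in I'_i} \left(\eta_\ell - \tilde{\mu}_i^j\right)^2\right] \\
&\overset{(a)}{\ge} E\left[\min_j \left\{\sum_{i=1}^{K} \sum_{\ell \in I_i} \left(\eta_\ell - \mu_i^j\right)^2 - \sum_{i=1}^{K} \sum_{\ell \in I'_i} \left(\eta_\ell - \tilde{\mu}_i^j\right)^2\right\}\right] \\
&\overset{(b)}{=} 2E\left[\min_j \left\{\tilde{\mu}_m^j \sum_{\ell=L+1}^{L+r} \eta_\ell + \tilde{\mu}_{m+1}^j \sum_{\ell=L+r+1}^{L+r+q} \eta_\ell - \mu_m^j \sum_{\ell=L+1}^{L+q} \eta_\ell - \mu_{m+1}^j \sum_{\ell=L+q+1}^{L+q+r} \eta_\ell\right\}\right],
\end{aligned}$$

where (a) uses the fact that $\min f - \min g \ge \min\{f - g\}$, for arbitrary functions $f, g$; and (b) follows from (37), in which $q = n_m$, $r = n_{m+1}$, and $L = n_1 + n_2 + \cdots + n_{m-1}$. Now using the formulae of $\tilde{\mu}_m^j$ and $\tilde{\mu}_{m+1}^j$ in (36), we obtain

$$\begin{aligned}
\Delta &\ge 2E\left[\min_j \left\{\frac{\mu_m^j - \mu_{m+1}^j}{q + r}\left((q - r)\sum_{\ell=L+1}^{L+r} \eta_\ell - 2r \sum_{\ell=L+r+1}^{L+q} \eta_\ell + (q - r)\sum_{\ell=L+q+1}^{L+q+r} \eta_\ell\right)\right\}\right] \\
&= \frac{2r(q - r)}{q + r}\, E\left[\min_j \left\{\left(\mu_m^j - \mu_{m+1}^j\right)\zeta\right\}\right] \\
&\overset{(a)}{=} \frac{2r(q - r)}{q + r}\left[\zeta^{+} \cdot \min_j \left\{\mu_m^j - \mu_{m+1}^j\right\} - \zeta^{-} \cdot \max_j \left\{\mu_m^j - \mu_{m+1}^j\right\}\right] \\
&\overset{(b)}{\ge} 0,
\end{aligned}$$

where $\zeta$ is the random variable specified in (15); (a) follows from the total expectation theorem; and (b) follows from constraint $\Omega_m$ and that $q > r$. The nonnegativity of $\Delta$ has proved the proposition.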
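The two algebraic facts the proof relies on can be checked with exact arithmetic. The sketch below assumes that the construction in (36) replaces the pair $(\mu_m, \mu_{m+1})$ by $\big(\tfrac{2q\mu_m + (r-q)\mu_{m+1}}{q+r}, \tfrac{(q-r)\mu_m + 2r\mu_{m+1}}{q+r}\big)$ with $q = n_m$ and $r = n_{m+1}$; the second component is a reconstruction chosen so that the identities hold, and the function name and sample values are hypothetical.

```python
from fractions import Fraction

def swap_construction(mu_m, mu_m1, q, r):
    """Codeword components after swapping group sizes q = n_m and r = n_{m+1}."""
    mu_m, mu_m1 = Fraction(mu_m), Fraction(mu_m1)
    new_m = (2 * q * mu_m + (r - q) * mu_m1) / (q + r)
    new_m1 = ((q - r) * mu_m + 2 * r * mu_m1) / (q + r)
    return new_m, new_m1

# Hypothetical values: group sizes q > r and components mu_m >= mu_{m+1}.
q, r = 4, 2
mu_m, mu_m1 = Fraction(3), Fraction(1)
new_m, new_m1 = swap_construction(mu_m, mu_m1, q, r)

# The gap between adjacent components is preserved.
print(new_m - new_m1 == mu_m - mu_m1)
# The weighted energy is preserved once the group sizes are swapped.
print(r * new_m**2 + q * new_m1**2 == q * mu_m**2 + r * mu_m1**2)
```

Using `Fraction` keeps both comparisons exact, so the checks do not depend on floating-point tolerance.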



Appendix B 
The Number and Density of Distinct Rates 

In this appendix, we discuss the distinct rate points at which fixed-rate ordinary PCs, CPCs with common compositions, and CPCs with possibly-different compositions may operate. For brevity, we restrict attention to Variant I codes.

The number of codewords (and therefore the rate) for an ordinary PC is determined by the multinomial coefficient $M = n!/(n_1!\, n_2! \cdots n_K!)$. The multinomial coefficient is invariant to the order of the $n_i$s, and so we are interested in the number of unordered compositions (or integer partitions) of $n$, $P(n)$. Hardy and Ramanujan [25] gave the asymptotic formula $P(n) \sim e^{\pi\sqrt{2n/3}}/(4n\sqrt{3})$. One might think that the number of possible distinct rate points is $P(n)$, but different sets of $\{n_i\}$ can yield the same multinomial coefficient. For example, at $n = 7$ both $(3, 2, 2)$ and $(4, 1, 1, 1)$ yield $M = 210$. Thus we are instead interested in the number of distinct multinomial coefficients, $N_{\mathrm{mult}}(n)$ [26], [27, A070289]. Clearly $P(n) \ge N_{\mathrm{mult}}(n)$. A lower bound to $N_{\mathrm{mult}}(n)$ is the number of unordered partitions of $n$ into parts that are prime, $P_{\mathbb{P}}(n)$, with asymptotic formula $P_{\mathbb{P}}(n) \sim \exp\{2\pi\sqrt{n/(3\ln n)}\}$. Thus the number of distinct rate points for ordinary PCs grows faster than any polynomial in the block length (though subexponentially, since it is at most $P(n)$).
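The collision at $n = 7$ and the gap between $P(n)$ and $N_{\mathrm{mult}}(n)$ can be reproduced by brute force. This is a small illustrative sketch with hypothetical function names; `hardy_ramanujan` simply evaluates the asymptotic formula quoted above.

```python
from collections import defaultdict
from math import exp, factorial, pi, sqrt

def partitions(n, max_part=None):
    """Yield the integer partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def multinomial(parts):
    """n! / (n_1! n_2! ... n_K!) for a partition of n."""
    count = factorial(sum(parts))
    for p in parts:
        count //= factorial(p)
    return count

def multinomial_table(n):
    """Map each multinomial coefficient to the partitions of n that produce it."""
    table = defaultdict(list)
    for parts in partitions(n):
        table[multinomial(parts)].append(parts)
    return table

def hardy_ramanujan(n):
    """Asymptotic estimate of the partition number P(n)."""
    return exp(pi * sqrt(2.0 * n / 3.0)) / (4.0 * n * sqrt(3.0))

table = multinomial_table(7)
num_partitions = sum(len(ps) for ps in table.values())
collisions = {m: ps for m, ps in table.items() if len(ps) > 1}
print("P(7) =", num_partitions, " N_mult(7) =", len(table))
print("collisions:", collisions)
print("Hardy-Ramanujan estimate of P(7):", hardy_ramanujan(7))
```

For $n = 7$ the only repeated coefficient is 210, so $N_{\mathrm{mult}}(7)$ is one less than $P(7)$.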

It follows easily that the average density of distinct rate points on the interval of possible rates grows without bound. Denote this average density by $\delta(n)$. The interval of possible rates is $[0, \log n!/n]$, so applying the upper and lower bounds gives the asymptotic expression

$$\frac{n\, e^{2\pi\sqrt{n/(3\ln n)}}}{\log n!} \lesssim \delta(n) \lesssim \frac{e^{\pi\sqrt{2n/3}}}{4\sqrt{3}\, \log n!}.$$
TABLE I
Number of Rate Points

n    J = 1    J = 2    J = 3    J = 4
2        2        3        4        5
3        3        6       10       15
4        5       15       33       56
5        7       27       68      132
6       11       60      207      517
7       14       97      415     1202
8       20      186     1038     3888
9       27      335     2440    11911

Taking the limits of the bounds then yields $\lim_{n\to\infty} \delta(n) = +\infty$.

The following proposition addresses the maximum gap between rate points, giving a result stronger than the statement on average density.

Proposition 3: The maximum spacing between any pair of adjacent rate points goes to 0 as $n \to \infty$.

Proof: First note that there are rate points at $0, \frac{\log n}{n}, \frac{\log[n(n-1)]}{n}, \frac{\log[n(n-1)(n-2)]}{n}, \ldots$ induced by integer partitions $(n)$, $(n-1, 1)$, $(n-2, 1, 1)$, $(n-3, 1, 1, 1)$, ... The lengths of the intervals between these rate points



$N_{\mathrm{mult}}(n)$ is the number of distinct rate points for ordinary PCs. If fixed-rate CPCs are restricted to have common compositions, then they too have the same number of distinct rate points. If different compositions are allowed, the number of distinct rate points may increase dramatically.

Recall the rate expression (11), and notice that distinct values of $\sum_j M_j$ will yield distinct rate points. Somewhat similarly to the distinct subset sum problem [28, pp. 174-175], we want to see how many distinct sums are obtainable from subsets of size $J$ selected with replacement from the possible multinomial coefficients of a given block length $n$. This set is denoted $\mathcal{M}(n)$ and satisfies $|\mathcal{M}(n)| = N_{\mathrm{mult}}(n)$; for example, $\mathcal{M}(4) = \{1, 4, 6, 12, 24\}$.

For a general set of integers of size $N_{\mathrm{mult}}(n)$, the number of distinct subset sums is upper-bounded by $\binom{N_{\mathrm{mult}}(n) + J - 1}{J}$. This is achieved, for example, by the set $\{1^{e}, 2^{e}, \ldots, N_{\mathrm{mult}}(n)^{e}\}$ for a sufficiently large exponent $e$. The number of distinct subset sums, however, can be much smaller. For example, for the set $\{1, 2, \ldots, N_{\mathrm{mult}}(n)\}$, this number is $J N_{\mathrm{mult}}(n) - J + 1$. We have been unable to obtain a general expression for the number of distinct subset sums of the set $\mathcal{M}(n)$; this seems to be a difficult number theoretic problem. It can be noted, however, that this number may be much larger than $N_{\mathrm{mult}}(n)$.

Exact computations for the number of distinct rate points at small values of $n$ and $J$ are provided in Table I.
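The counts for small $n$ and $J$ can be reproduced by enumerating all size-$J$ selections with replacement from $\mathcal{M}(n)$ and counting distinct sums. The sketch below is a brute-force illustration with hypothetical function names; it is practical only for small block lengths.

```python
from itertools import combinations_with_replacement
from math import factorial

def partitions(n, max_part=None):
    """Yield the integer partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def multinomial(parts):
    """n! / (n_1! n_2! ... n_K!) for a partition of n."""
    count = factorial(sum(parts))
    for p in parts:
        count //= factorial(p)
    return count

def multinomial_set(n):
    """The set M(n) of distinct multinomial coefficients at block length n."""
    return {multinomial(parts) for parts in partitions(n)}

def count_rate_points(n, num_codewords):
    """Number of distinct sums of num_codewords elements of M(n), repetition allowed."""
    values = sorted(multinomial_set(n))
    sums = {sum(choice) for choice in combinations_with_replacement(values, num_codewords)}
    return len(sums)

for n in range(2, 6):
    print(n, [count_rate_points(n, J) for J in range(1, 5)])
```

Each printed row lists the counts for $J = 1, 2, 3, 4$ at the given $n$, in the same layout as the table.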